tensor block model
Multiway clustering via tensor block models
We consider the problem of identifying multiway block structure from a large noisy tensor. Such problems arise frequently in applications such as genomics, recommendation systems, topic modeling, and sensor network localization. We propose a tensor block model, develop a unified least-squares estimator, and obtain theoretical accuracy guarantees for multiway clustering. We establish the statistical convergence of the estimator and show that the associated clustering procedure achieves partition consistency. We further develop a sparse regularization for identifying important blocks with elevated means. The proposal handles a broad range of data types, including binary, continuous, and hybrid observations. Through simulation and application to two real datasets, we demonstrate that our approach outperforms previous methods.
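To make the model in this abstract concrete, the following is a minimal sketch of an order-3 tensor block model and of the least-squares estimate of the block means when the cluster memberships are held fixed. The function names `simulate_tbm` and `block_means`, the Gaussian noise, and the nested-list layout are assumptions made for illustration; they are not taken from the paper, whose estimator also optimizes over the memberships.

```python
import random


def simulate_tbm(core, labels, sigma, seed=0):
    """Draw Y[i][j][k] = core[a][b][c] + Gaussian noise, where (a, b, c) are
    the cluster labels of indices (i, j, k) along the three modes.
    Illustrative assumption: i.i.d. N(0, sigma^2) noise."""
    rng = random.Random(seed)
    z1, z2, z3 = labels
    return [[[core[a][b][c] + rng.gauss(0.0, sigma) for c in z3]
             for b in z2]
            for a in z1]


def block_means(Y, labels, ranks):
    """Least-squares estimate of the core for FIXED memberships:
    the average of the observed entries that fall in each block."""
    z1, z2, z3 = labels
    r1, r2, r3 = ranks
    total = [[[0.0] * r3 for _ in range(r2)] for _ in range(r1)]
    count = [[[0] * r3 for _ in range(r2)] for _ in range(r1)]
    for i, a in enumerate(z1):
        for j, b in enumerate(z2):
            for k, c in enumerate(z3):
                total[a][b][c] += Y[i][j][k]
                count[a][b][c] += 1
    # Empty blocks get mean 0.0 by convention in this sketch.
    return [[[total[a][b][c] / count[a][b][c] if count[a][b][c] else 0.0
              for c in range(r3)]
             for b in range(r2)]
            for a in range(r1)]


# Usage: a 4 x 5 x 4 tensor with two clusters per mode.
true_core = [[[0.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]]
memberships = ([0, 0, 1, 1], [0, 1, 0, 1, 1], [0, 0, 0, 1])
Y_noisy = simulate_tbm(true_core, memberships, sigma=0.5)
core_hat = block_means(Y_noisy, memberships, (2, 2, 2))
```

With the memberships known, the least-squares problem separates across blocks, which is why the block average is the minimizer. The hard part of multiway clustering is estimating the memberships themselves.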
Heteroskedastic Tensor Clustering
Tensor clustering, which seeks to extract underlying cluster structures from noisy tensor observations, has gained increasing attention. One extensively studied model for tensor clustering is the tensor block model, which postulates the existence of clustering structures along each mode and has found broad applications in areas like multi-tissue gene expression analysis and multilayer network analysis. However, currently available computationally feasible methods for tensor clustering either are limited to handling i.i.d. sub-Gaussian noise or suffer from suboptimal statistical performance, which restricts their usefulness in applications that have to deal with heteroskedastic data and/or a low signal-to-noise ratio (SNR). To overcome these challenges, we propose a two-stage method, named High-order HeteroClustering (HHC), which starts by performing tensor subspace estimation via a novel spectral algorithm called Thresholded Deflated-HeteroPCA, followed by approximate k-means to obtain cluster nodes. Encouragingly, our algorithm provably achieves exact clustering as long as the SNR exceeds the computational limit (ignoring logarithmic factors); here, the SNR refers to the ratio of the pairwise disparity between nodes to the noise level, and the computational limit indicates the lowest SNR that enables exact clustering with polynomial runtime. Comprehensive simulation and real-data experiments suggest that our algorithm outperforms existing algorithms across various settings, delivering more reliable clustering performance.
Smooth tensor estimation with unknown permutations
Higher-order tensor datasets arise ubiquitously in modern data science applications, for instance, recommendation systems (Baltrunas et al., 2011; Bi et al., 2018), social networks (Bickel and Chen, 2009), genomics (Hore et al., 2016), and neuroimaging (Zhou et al., 2013). Tensors provide an effective representation of data structure that classical vector- and matrix-based methods fail to capture. One example is a music recommendation system (Baltrunas et al., 2011) that records users' ratings of songs in various contexts. This three-way tensor of user × song × context allows us to investigate interactions between users and songs in a context-specific manner. Another example is a network dataset that records the connections among a set of nodes. Pairwise interactions are often insufficient to capture the complex relationships, whereas multi-way interactions improve the understanding of networks in molecular systems (Young et al., 2018) and social networks (Han et al., 2020). In both examples, higher-order tensors represent multi-way interactions efficiently. The tensor estimation problem cannot be solved without imposing structure. An appropriate reordering of tensor entries often provides an effective representation of the hidden salient structure.
Exact Clustering in Tensor Block Model: Statistical Optimality and Computational Limit
Han, Rungang, Luo, Yuetian, Wang, Miaoyan, Zhang, Anru R.
High-order clustering aims to identify heterogeneous substructure in multiway datasets that arise commonly in neuroimaging, genomics, and social network studies. The non-convex and discontinuous nature of the problem poses significant challenges in both statistics and computation. In this paper, we propose a tensor block model and two computationally efficient methods, the high-order Lloyd algorithm (HLloyd) and high-order spectral clustering (HSC), for high-order clustering in the tensor block model. The convergence of the proposed procedure is established, and we show that our method achieves exact clustering under reasonable assumptions. We also give a complete characterization of the statistical-computational trade-off in high-order clustering based on three different signal-to-noise ratio regimes. Finally, we show the merits of the proposed procedures via extensive experiments on both synthetic and real datasets.
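The idea of a Lloyd-type iteration for a tensor block model can be sketched as alternating between two steps: average the entries within each block, then reassign every index along one mode to the cluster that gives the smallest squared error. The code below is a simplified illustration under stated assumptions, not the authors' HLloyd: the starting labels are supplied by the caller rather than computed by HSC, the tensor is stored as a dictionary keyed by index tuples, and all function names (`block_means`, `update_mode`, `lloyd_tbm`) are hypothetical.

```python
from itertools import product


def block_means(Y, dims, labels, ranks):
    """Average the entries of Y inside each block defined by `labels`.
    Y is a dict mapping index tuples to floats (an assumed layout)."""
    sums, counts = {}, {}
    for idx in product(*(range(d) for d in dims)):
        blk = tuple(labels[m][idx[m]] for m in range(len(dims)))
        sums[blk] = sums.get(blk, 0.0) + Y[idx]
        counts[blk] = counts.get(blk, 0) + 1
    # Empty blocks get mean 0.0 by convention in this sketch.
    return {blk: sums[blk] / counts[blk] if blk in counts else 0.0
            for blk in product(*(range(r) for r in ranks))}


def update_mode(Y, dims, labels, ranks, core, mode):
    """Reassign each index along `mode` to the cluster with the smallest
    squared error, holding the other modes' labels and the core fixed."""
    new_labels = []
    for i in range(dims[mode]):
        costs = [0.0] * ranks[mode]
        axes = [[i] if m == mode else range(d) for m, d in enumerate(dims)]
        for idx in product(*axes):
            blk = [labels[m][idx[m]] for m in range(len(dims))]
            for a in range(ranks[mode]):
                blk[mode] = a
                costs[a] += (Y[idx] - core[tuple(blk)]) ** 2
        new_labels.append(min(range(ranks[mode]), key=costs.__getitem__))
    return new_labels


def lloyd_tbm(Y, dims, ranks, init_labels, n_iter=10):
    """Alternate block averaging and per-mode reassignment for n_iter sweeps."""
    labels = [list(z) for z in init_labels]
    for _ in range(n_iter):
        for mode in range(len(dims)):
            core = block_means(Y, dims, labels, ranks)
            labels[mode] = update_mode(Y, dims, labels, ranks, core, mode)
    return labels, block_means(Y, dims, labels, ranks)
```

Each reassignment step cannot increase the squared-error objective for the current block means, which is what makes the scheme a coordinate-descent analogue of Lloyd's k-means. Like k-means, it depends on the quality of the initial labels.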